Quantum Algorithm for Numerical Energy Gradient Calculations at the Full Configuration Interaction Level of Theory

A Bayesian phase difference estimation (BPDE) algorithm allows us to compute the energy gap of two electronic states of a given Hamiltonian directly by utilizing the quantum superposition of their wave functions. Here we report an extension of the BPDE algorithm to the direct calculation of the energy difference of two molecular geometries. We apply the BPDE algorithm for the calculation of numerical energy gradients based on the two-point finite-difference method, enabling us to execute geometry optimization of one-dimensional molecules at the full-CI level on a quantum computer. Results of numerical quantum circuit simulations of the geometry optimization of the H2 molecule with the STO-3G and 6-31G basis sets, the LiH and BeH2 molecules at the full-CI/STO-3G level, and the N2 molecule at the CASCI(6e,6o)/6-311G* level are given.


Definitions of quantum gates and quantum circuits
In quantum computers, qubits can be in an arbitrary superposition of the |0⟩ and |1⟩ states, as given in eq (S1).
The quantum state | ⟩ in eq (S1) can also be represented by a matrix as follows: Quantum gates acting on one qubit can be expressed by a (2 × 2) unitary matrix and the quantum state after the quantum gate application can be calculated by matrix algebra. For example, the quantum state after the application of an Hadamard (Had) gate can be calculated as in eq (S4).
The circuit symbols and matrix representations of the quantum gates frequently used for quantum chemical calculations are summarized in Table S1. As illustratively described in the controlled-Rz gates in Table S1, the conditional operation is executed if and only if the controlled qubit is in the |1⟩ (|0⟩) state when the circuit symbol for the controlled qubit is a close (open) circle. In the quantum circuit, the horizontal lines denote a qubit or N-qubits, and quantum gates are applied to the qubits from left to right order. Pauli-X

Implementation of the BPDE-based numerical energy gradient calculations
The BPDE-based numerical energy gradient calculation utilizes the quantum circuit given in Figure   3a in the main text, and two subsequent controlled-time evolution operations are implemented by using the quantum circuit depicted in Figure 3c. The wave function |⟩ used as the input in the BPDE is described as the linear combination of eigenfunctions, as in eq (S5) and (S6) for geometry A and B, respectively. As discussed in the main text, we used | (A) ⟩ = | (B) ⟩ = |⟩ for the finite difference-based numerical energy gradient calculations.
Here, |j (A) ⟩ and |k (B) ⟩ are the j-th eigenfunction at geometry A and the k-th eigenfunction of geometry B, respectively. Assuming eq (S5) and (S6), the probability of obtaining the |0⟩ state in the measurement of the ancillary qubit in the quantum circuit in Figure 3a is calculated as in eq (S7).
Here, Ej (A) and Ek (B) are the energy eigenvalues of the j-th electronic state at geometry A and the k-th In the calculations of numerical energy gradients with respect to nuclear coordinates, we adopted In the step (2), we set the initial mean of the prior distribution as 0 Hartree. The standard deviation of the prior distribution is defined as 1 Hartree. Note that the standard deviation determines the width of the search area in Bayesian inference, and therefore an initial value of the standard deviation must be large enough so that an actual value of E locates in the range between ( − ) and ( + ). It should be also noted that the initial value of the standard deviation of the prior distribution is substantially large for the application of the gradient computation based on finite difference method, and smaller values such as 0.01 Hartree (≈ 6.27 kcal mol) may be enough.
In the step (3), the evolution time length t is set as in eq (S8). This condition is derived empirically so as to the likelihood function (0|Δ ; ) has a sufficiently large gradient and (0|Δ ; ) has a single maximum in the range of − ≤ ≤ + .
In the step (4), we used m = 21 and R = 50000. We set the R value substantially larger than that used in the previous BPDE-based energy gap and full-CI energy calculations (R = 1000), 1,2 because we have to deal with very small energy differences in the finite difference-based gradient computations.
Preliminary simulation results of the m and R dependences on the gradient values are given in Section 4 of this Supporting Information.
To execute the BPDE-based numerical energy gradient calculations, the molecular Hamiltonians HA and HB are transformed to the qubit Hamiltonian consisting of a linear combination of Pauli strings as in eq (11) and (12) in the main text, by using Jordan-Wigner transformation. 3

MO integrals appearing
in the Hamiltonian were prepared by utilizing our own AO → MO integral transformation program, in conjunction with the one-and two-electron atomic orbital integrals computed by using GAMESS-US program package, 4 or computed by using PySCF program. 5 The quantum circuit for the BPDE algorithm contains the controlled-time evolution operator UA = exp(−iHAt) and UB = exp(−iHBt). The quantum circuit corresponding to the controlled-UA and controlled-UB operators is constructed using the technique illustrated in Figure 3c in the main text, with the second-order Trotter decomposition given in eq (S9) with the time for a single Trotter step t/N = 0.5. It is known that Trotter decomposition error depends on the ordering of Trotterized terms to be applied. 6,7 In this work we used the magnitude ordering, in which Pauli strings are ordered by the absolute value of the sum of the norms of Pauli strings |uj| + |vj|. 7
Because both the prior distribution (Δ ) and the likelihood function (0|Δ ; ) are given as Gaussian functions, we can easily calculate the posterior distribution.
The energy threshold used for the convergence check in the step (6) was set to be EThre = 0.0005 Hartree for all simulations. This threshold value is smaller than that used in the previous BPDE-based energy gap and total energy calculations, because we have to discuss small energy differences in the finite difference-based gradient calculations. The numerical quantum circuit simulation program was developed by utilizing OpenFermion 8 and Cirq 9 libraries in Python.

Computational conditions for the BPDE-based numerical energy gradient calculations and geometry optimizations
In this study the geometry optimizations of H2, LiH, BeH2, and N2 molecules were performed using the numerical energy gradients computed from the BPDE quantum circuit simulations in conjunction with the gradient-only optimization algorithm. We employed the full-CI/STO-3G method in the geometry optimization of H2, LiH, and BeH2 molecules. The number of qubits used for wave function encoding is 4, 12, and 14 for H2, LiH, and BeH2, respectively. For H2 molecule we also examined the geometry optimization at the full-CI/6-31G level using 8 qubits for wave function storage. In the geometry optimization of N2 molecule we used the (6e,6o) active space consisting of valence /* and (S11) in conjunction with the occupation number of the lowest unoccupied natural orbitals nLUNO. 10 Then, the two-configurational wave function is constructed by using the quantum circuit depicted in Figure S1 and use it as the input wave function in the BPDE.  Figure 4 in the main text and the geometry optimization of N2 molecule at the CASCI(6e,6o)/6-311G(d) level of theory, the BS-UHF calculations were performed by using GAMESS-US program package. One-and two-electron MO integrals were prepared using in-house AO → MO integral transformation program, in conjunction with the one-and two-electron AO integrals computed by using GAMESS-US software. In the geometry optimizations of H2, LiH, and BeH2 molecules the RHF calculations were performed using PySCF program package, and corresponding one-and twoelectron MO integrals were obtained from PySCF, by using OpenFermion-PySCF library. Because the likelihood function in the Bayesian optimization is calculated based on the finite number of sampling, the gradient values computed using the BPDE algorithm contain statistical errors. In the gradient calculations in Figure 4 in the main text, the numerical quantum circuit simulations were performed five times at each geometry. We confirmed that the standard deviations of five runs are less than 0.0006 Hartree/Å for all geometries being investigated.
In the geometry optimizations based on the numerical energy gradients computed from the BPDE algorithm, we used a gradient-only algorithm 11 for the molecular geometry update. In the gradientonly optimization algorithm the three-point bisection method is used instead of the golden-section S8 search to find an extremum. Detailed procedures for the gradient-only geometry optimization are as follows. (1) Calculate the gradient at the initial geometry. (2) Calculate the gradient at the geometry (x − lg), where x is the initial geometry, g is the gradient vector at the initial geometry. The line search iterations l are incremented until the sign of the gradient changes. (3) Refine the location of the sign change using the three-point bisection method until displacement of atoms becomes smaller than the threshold value RThre. In the present study we used RThre = 0.002 Å.
To investigate accuracy of the optimized geometries computed from the BPDE-based numerical energy gradients and gradient-only optimization algorithm, we also executed conventional geometry optimizations using GAMESS-US program package.   Figure S2. The simulations were executed five times for each geometry and averaged out, and the standard deviations of five runs were plotted. Figure S2 indicates that the smaller r value gave the larger errors of the dE/dR values. Especially, if r = 0.0001 Å is adopted the dE/dR values exhibit large standard deviations. This is because we used the same threshold value for the convergence check in the Bayesian optimization (EThre = 0.0005 Hartree) for all simulations, but this value is too small to predict the energy difference accurately when a small r value is used. We have to adopt tighter threshold values to calculate the gradient accurately for the smaller r values. Figure   S2 indicates that r = 0.0025 or 0.0050 Å are suitable from both the viewpoints of mean values and standard deviations. We expect that r = 0.0050 Å may suffer from inaccuracy of the gradient values when strong anharmonicity is present on the potential energy surface. Thus, we adopted r = 0.0025 Å for the finite difference value for all simulations.
Results of the numerical simulation of the energy gradients of H2 using difference number of quantum circuit repetitions are summarized in Figure S3. We used m = 21 and t/N = 0.5 for these simulations. As naturally expected, the smaller number of quantum circuit repetitions gave the larger standard deviations, although the mean of dE/dR values did not change so much. Figure S4 summarizes the simulation results by changing the number of sampling points, while the total number of measurements is being approximately fixed (mR ~ constant). The time step for the single Trotter step was set to be t/N = 0.5. The results in Figure S4 imply that accuracy of the numerical energy gradient values are retained as long as the total number of measurements is the same.
To investigate the effect of Trotter decomposition errors, we performed numerical simulations with different t/N values (t/N = 0.1, 0.5, and 1.0). m and R were set to be 21 and 50000, respectively. The results are plotted in Figure S5

S10
and therefore we concluded that the Trotter decomposition error is sufficiently small for t/N = 0.5 and shorter.
To check the input wave function dependence on the quality of the numerical energy gradient values in H2 molecule, we performed the quantum circuit simulations using the RHF/STO-3G wave function as the input for all bond lengths. The results are summarized in Figure S6. Note that in Figure 5

Results of the geometry optimizations of H2 molecule based on the BPDE-based numerical energy gradients and gradient-only optimization algorithm
In the geometry optimization of H2 molecule at the full-CI/STO-3G level of theory, we examined six different initial geometries with R(H-H) = 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0 Å, and geometry optimizations were performed five times for every geometry. The results of the geometry optimizations were plotted in Figure 5 in the main text, and the optimized bond lengths are summarized in Table S2. As discussed in the main text, the geometry optimization converged after 5-10 iterations, and the optimized bond length is calculated to be R(H-H) = 0.736381 ± 0.000876 Å. The optimized bond length at the full-CI/STO-3G level computed by using GAMESS-US is R(H-H) = 0.734868 Å. In order to disclose the origin of the difference of the optimized bond length between the BPDE-based quantum circuit simulations and traditional quantum chemical calculations, we have carried out geometry optimization using the gradient-only optimization algorithm in conjunction with numerical energy gradients computed from two separate full-CI/STO-3G calculations using GAMESS-US software. The same criteria for geometry optimization convergence were adopted. By setting the initial geometry as R(H-H) = 1.0 Å, the geometry optimization converged after 9 iterations, and the optimized bond length is R(H-H) = 0.734601 Å. Therefore, using the numerical energy gradient itself is not responsible for the deviation.
As discussed in Figure 4 in the main text, the BPDE-based numerical energy gradient tends to negatively shift around the equilibrium geometry. As a result, the BPDE-based gradient predicts the equilibrium bond distance slightly longer than the numerical energy gradient computed from two separate full-CI calculations. As discussed in the previous section, increasing Trotter slices did not improve the gradient value so much, and therefore quality of the input wave function is majorly responsible for the deviation.
We expect that using more sophisticated wave functions such as wave functions prepared by adopting adiabatic state preparation 12 as the input in BPDE will improve the gradient values and hence optimized geometries.